essaMEM: finding maximal exact matches using enhanced sparse suffix arrays

نویسندگان

  • Michaël Vyverman
  • Bernard De Baets
  • Veerle Fack
  • Peter Dawyndt
چکیده

We have developed essaMEM, a tool for finding maximal exact matches that can be used in genome comparison and read mapping. essaMEM enhances an existing sparse suffix array implementation with a sparse child array. Tests indicate that the enhanced algorithm for finding maximal exact matches is much faster, while maintaining the same memory footprint. In this way, sparse suffix arrays remain competitive with the more complex compressed suffix arrays.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays

MOTIVATION High-throughput sequencing technologies place ever increasing demands on existing algorithms for sequence analysis. Algorithms for computing maximal exact matches (MEMs) between sequences appear in two contexts where high-throughput sequencing will vastly increase the volume of sequence data: (i) seeding alignments of high-throughput reads for genome assembly and (ii) designating anc...

متن کامل

Accurate Long Read Mapping using Enhanced Suffix Arrays

With the rise of high throughput sequencing, new programs have been developed for dealing with the alignment of a huge amount of short read data to reference genomes. Recent developments in sequencing technology allow longer reads, but the mappers for short reads are not suited for reads of several hundreds of base pairs. We propose an algorithm for mapping longer reads, which is based on chain...

متن کامل

E-MEM: efficient computation of maximal exact matches for very large genomes

MOTIVATION Alignment of similar whole genomes is often performed using anchors given by the maximal exact matches (MEMs) between their sequences. In spite of significant amount of research on this problem, the computation of MEMs for large genomes remains a challenging problem. The leading current algorithms employ full text indexes, the sparse suffix array giving the best results. Still, their...

متن کامل

andi: Fast and accurate estimation of evolutionary distances between closely related genomes

MOTIVATION A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes. RESULTS Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a mini...

متن کامل

The Enhanced Suffix Array and Its Applications to Genome Analysis

In large scale applications as computational genome analysis, the space requirement of the suffix tree is a severe drawback. In this paper, we present a uniform framework that enables us to systematically replace every string processing algorithm that is based on a bottomup traversal of a suffix tree by a corresponding algorithm based on an enhanced suffix array (a suffix array enhanced with th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 29 6  شماره 

صفحات  -

تاریخ انتشار 2013